Keir Fraser [Fri, 4 Dec 2009 07:11:44 +0000 (07:11 +0000)]
libxenlight: get state for one domain
Simple function to get the dominfo state of a single domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:11:06 +0000 (07:11 +0000)]
libxenlight: domain resume
Added libxenlight implementation for resume domain.
This brings back a cooperative pv domain from the
shutdown state after save, enabling checkpointing.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:10:22 +0000 (07:10 +0000)]
libxenlight: Destroy device model only for domains that have it
Destroy device model only for domains that have it.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:09:44 +0000 (07:09 +0000)]
libxenlight: avoid writing empty values to xenstore
Prevent segmentation fault caused by empty values
in key-value pairs for the /vm/ subdirectory
when restoring a pv domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:06:47 +0000 (07:06 +0000)]
libxenlight: disk and nic destroy calls
Expose disk and nic device destroy calls
Also removes the obsolete device shutdown calls.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:03:45 +0000 (07:03 +0000)]
libxenlight: refactor libxl destroy code
Refactor libxl device destroy code. Abstract function
waiting for the watch on the state node to fire.
Create a generic device delete function.
Only a single LIBXL_DESTROY_TIMEOUT elapses when
waiting for destruction of all the devices of a
domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:02:49 +0000 (07:02 +0000)]
libxenlight: fix GC when cloning contexts
Provide a function to clone a context. This is necessary
because simply copying the structs will eventually
corrupt the GC: maxsize is updated in the cloned context
but not in the originating, yet they have the same array
of referenced pointers alloc_ptrs.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:00:25 +0000 (07:00 +0000)]
xend: Fix parameters to PyArg_ParseTupleAndKeywords()
The kwd_list parameter of PyArg_ParseTupleAndKeywords() must be a
NULL-terminated list.
Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Fri, 4 Dec 2009 06:59:33 +0000 (06:59 +0000)]
x86: XENMEM_add_to_physmap should propagate errors from guest_physmap_add_page().
Authored-by: David Lively
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Keir Fraser [Fri, 4 Dec 2009 06:58:08 +0000 (06:58 +0000)]
Add keyhandler 'g' to print all active grant table entries.
Authored-By: Robert Phillips
Signed-off-By: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Keir Fraser [Fri, 4 Dec 2009 06:51:53 +0000 (06:51 +0000)]
libxenlight: Get rid of the dependency on the LIBCONFIG_SOURCE directory.
Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com>
Keir Fraser [Fri, 4 Dec 2009 06:50:46 +0000 (06:50 +0000)]
libxenlight: Delete dep files on 'make clean', and include them in Makefile rules.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 3 Dec 2009 13:52:02 +0000 (13:52 +0000)]
grant-tables: do not fail attempts to GNTTABOP_set_version to the current version.
...even if there are active grants.
This triggers when checkpointing a guest which essentially resumes
without actually having gone through the suspend so the domain is
already latched to v2 inside Xen.
Also return the current actual version on success and failure. Not
terribly useful with only 2 options but is more robust to future
developments.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
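A rough Python model of the rule described in this commit, not the hypervisor's C code. The function name, the (rc, version) return shape and the numeric error values are assumptions made for illustration only.

```python
# Illustrative model only: names and error values are hypothetical.
EINVAL = 22
EBUSY = 16

def set_version(current, requested, active_grants):
    """Return (rc, version); version is the one in effect afterwards."""
    if requested not in (1, 2):
        return (-EINVAL, current)
    if requested == current:
        # Re-selecting the current version succeeds even with active grants.
        return (0, current)
    if active_grants:
        # A real version switch is still refused while grants are active.
        return (-EBUSY, current)
    return (0, requested)
```

The current version is returned on both success and failure, so a caller can always learn which version the domain is latched to.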
Keir Fraser [Thu, 3 Dec 2009 13:51:20 +0000 (13:51 +0000)]
xend: Add GPL license stanza to MemoryPool.py
Signed-off-by: James Song (Wei) <jsong@novell.com>
Keir Fraser [Thu, 3 Dec 2009 13:50:43 +0000 (13:50 +0000)]
Remus: fall back to xenstore if necessary
This is primarily for pvops until it gets a dedicated suspend
event channel.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Thu, 3 Dec 2009 13:50:14 +0000 (13:50 +0000)]
Remus: fix shadow memory allocation, broken by 20558:4ed3b9b1de3f
This approach is perhaps a little cleaner than directly calling
balloon.free.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Wed, 2 Dec 2009 18:46:14 +0000 (18:46 +0000)]
x86 hvm: fix up the unified HAP nested-pagefault handler.
A guest PFN may have been marked dirty and switched to p2m_ram_rw by
another CPU between the VMEXIT and lookup in this handler, so
we can't just check for p2m_ram_logdirty. Also, handle_mmio
doesn't handle passthrough MMIO.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:43:28 +0000 (18:43 +0000)]
xentop: Allow full domain name display
Add a '-f' option to xentop to allow the full domain name to be
displayed. This is the original behavior which can cause the display
to be unaligned. Customers have requested this because only the
trailing characters of their domain names are unique and therefore
cannot be distinguished when the display is limited to a 10 character
width.
Signed-off-by: Charles Arnold <carnold@novell.com>
Keir Fraser [Wed, 2 Dec 2009 18:42:36 +0000 (18:42 +0000)]
libxenlight: fix multiple xenstore watches problem
This patch fixes the multiple xenstore watches problem in libxenlight
by opening a new xenstore connection to set and read temporary watches on
the device state nodes. This way they don't interfere with other long
running watches.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:42:03 +0000 (18:42 +0000)]
libxenlight: use watch and select in libxl_wait_for_device_model
This patch reimplements libxl_wait_for_device_model using a xenstore
watch and a select loop.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
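The general shape of a "wait on a descriptor with select until an event arrives or a timeout expires" loop can be sketched in Python as below. This is not libxl code: an ordinary file descriptor stands in for the xenstore watch descriptor, and the function name and timeout handling are assumptions.

```python
import os
import select
import time

def wait_for_event(fd, timeout_s):
    """Block until fd is readable or the timeout elapses.

    Returns the bytes read, or None on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        # select() sleeps until the descriptor is readable or time runs out.
        readable, _, _ = select.select([fd], [], [], remaining)
        if readable:
            return os.read(fd, 4096)
```

Compared with polling in a sleep loop, the caller wakes up as soon as the event fires and never waits longer than the timeout.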
Keir Fraser [Wed, 2 Dec 2009 18:41:31 +0000 (18:41 +0000)]
libxenlight: fix dm_xenstore_record_pid
The function dm_xenstore_record_pid is executed by a child of the main
process and therefore shouldn't use the same xenstore connection:
currently it opens a new connection but still uses the old one.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 13:45:35 +0000 (13:45 +0000)]
xenstat: Fixes for 20528:e6e3bf767d16 (stats for dom0 network bonding)
In the above c/s I introduced dom0 statistics for the case where network
bonding is used. The indentation did not match the xenstat C codebase, and
some modifications were made to the logic, mainly to stop using the parsed
variables we don't care about (we care only about
{tx|rx}{bytes,packets,errs,drops} and no other variable from
/proc/net/dev) by passing NULLs for them. Also, the
dom0 statistics alteration was fixed to include {tx|rx}{drop,errs} for
dom0 (the previous version of my patch did not have this code applied).
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Wed, 2 Dec 2009 13:43:37 +0000 (13:43 +0000)]
xend, vt-d: do not reserve vtd_mem if iommu is not enabled
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Wed, 2 Dec 2009 13:39:07 +0000 (13:39 +0000)]
vmx: During task-switch, read instr-len VMCS field only when valid.
Otherwise we can crash on the BUG_ON() in __get_instruction_length().
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 2 Dec 2009 08:52:50 +0000 (08:52 +0000)]
VT-d: Fix indentation to make log messages more readable in dmar.c
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Wed, 2 Dec 2009 08:51:59 +0000 (08:51 +0000)]
pci: Correct BDF format from B:D:F to B:D.F in log messages.
Signed-off-by: Weidong Han <weidong.han@intel.com>
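For reference, the B:D.F notation separates bus and device with a colon and device and function with a dot. A small Python sketch follows; the helper name and the zero-padding widths are assumptions, not taken from the patch.

```python
def format_bdf(bus, dev, func):
    """Format a PCI address as bus:device.function in hex."""
    return "%02x:%02x.%x" % (bus, dev, func)
```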
Keir Fraser [Wed, 2 Dec 2009 08:51:12 +0000 (08:51 +0000)]
xend: Memory pool for pv guest on systems with >128G memory
The main idea of this patch is:
1) The admin sets aside some memory below 128G for 32-bit paravirtual
domain creation (via dom0_mem=-<value> in kernel command line).
2) The admin also explicitly states to the tools (i.e. xend) how much
memory is supposed to be left untouched by 64-bit domains
3) If a 32-bit pv DomU gets created, no ballooning ought to be
necessary (since if it is, no guarantee can be made about the address
range of the memory ballooned out), and memory gets allocated from the
reserved range.
4) Upon 64-bit (or 32-bit HVM or HVM) DomU creation, the tools
determine the amount of memory to be ballooned out of Dom0 by adding
the amount needed for the new guest and the amount still in the
reserved pool (and then of course subtracting the total amount of
memory the hypervisor has available for guest use).
Signed-off-by: james song (wei) <jsong@novell.com>
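The arithmetic in point 4 can be sketched as follows. This is an illustration in Python, not xend's code; the function name, the KiB units and the clamp at zero are assumptions.

```python
def dom0_balloon_kib(guest_need_kib, pool_remaining_kib, xen_free_kib):
    """Memory to balloon out of Dom0 for a 64-bit or HVM guest.

    guest need + memory still held in the reserved pool
    - memory the hypervisor already has free for guests.
    """
    shortfall = guest_need_kib + pool_remaining_kib - xen_free_kib
    # Assumed: nothing needs to be ballooned when enough memory is free.
    return max(0, shortfall)
```

Adding the remaining pool size keeps the reserved range from being consumed by guests that do not need low memory.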
Keir Fraser [Wed, 2 Dec 2009 08:48:36 +0000 (08:48 +0000)]
VT-d: get rid of hardcode in iommu_flush_cache_entry
Currently iommu_flush_cache_entry uses a fixed size 8 bytes to flush
cache. But it also needs to flush caches with different sizes,
e.g. struct root_entry is 16 bytes. This patch fixes the hardcode by
using a parameter "size" to flush caches with different sizes.
Signed-off-by: Weidong Han <weidong.han@intel.com>
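To illustrate why the size matters, the sketch below lists the cache lines a flush of `size` bytes starting at `addr` has to cover. It is a Python model, not the VT-d code; the helper name and the 64-byte default line size are assumptions.

```python
def lines_to_flush(addr, size, line_size=64):
    """Return the start addresses of every cache line touched by
    the byte range [addr, addr + size)."""
    first = addr - (addr % line_size)
    last_byte = addr + size - 1
    last = last_byte - (last_byte % line_size)
    return list(range(first, last + line_size, line_size))
```

With a fixed 8-byte length only the first line is guaranteed to be covered; passing the real structure size covers every line the structure occupies.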
Keir Fraser [Wed, 2 Dec 2009 08:47:49 +0000 (08:47 +0000)]
xm: fix message in OptionError deprecated since Python 2.6
BaseException.message has been deprecated since Python 2.6. To
prevent DeprecationWarning from popping up over this pre-existing
attribute, use a new property that takes lookup precedence.
Signed-off-by: Wei Kong <weikong.cn@gmail.com>
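A minimal sketch of the property approach, assuming a simplified exception class. The constructor arguments and the `_message` attribute are hypothetical and may not match xm's actual OptionError.

```python
class OptionError(Exception):
    """Simplified stand-in for an option-parsing error."""

    def __init__(self, message, usage=""):
        Exception.__init__(self, message)
        self._message = message
        self.usage = usage

    def _get_message(self):
        return self._message

    def _set_message(self, value):
        self._message = value

    # A property defined on the class is found before any attribute
    # inherited from the base exception, so reads and writes of
    # .message go through these accessors.
    message = property(_get_message, _set_message)

    def __str__(self):
        return self.message
```

Existing callers that read `e.message` keep working unchanged.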
Keir Fraser [Wed, 2 Dec 2009 08:46:47 +0000 (08:46 +0000)]
docs: new tsc_mode VM configuration option
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 2 Dec 2009 08:46:11 +0000 (08:46 +0000)]
remus: Skip Linux-specific build components on other OSes
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Wed, 2 Dec 2009 08:45:16 +0000 (08:45 +0000)]
libxenlight: write stubdoms logs to file
It turns out that there is a better way to write stubdoms logs to file
than using libxl_console_attach: qemu is the one that provides the
console backend for stubdoms and qemu is able to redirect a serial to
file, so we can use this feature to make sure the first stubdom
console is always redirected to a logfile.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 08:44:40 +0000 (08:44 +0000)]
libxenlight: two small fixes
- set the domid of the guest and not the one of the stubdom in the
libxl_device_model_starting returned to the user;
- check that the length of the two strings matches in
libxl_name_to_domid, otherwise we can get a match for two different
domains that have the same initial part of the name.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
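The second fix can be illustrated with a small Python model of a length-limited comparison. The helper names are hypothetical, and the sketch assumes the original comparison only looked at the first len(wanted) characters.

```python
def prefix_match(wanted, candidate):
    """Compare only the first len(wanted) characters."""
    return candidate[:len(wanted)] == wanted

def exact_match(wanted, candidate):
    """Require equal lengths as well, so a shared prefix is not enough."""
    return len(wanted) == len(candidate) and prefix_match(wanted, candidate)
```

With only the prefix comparison, looking up "web" could return the domid of "web-staging"; the length check rules that out.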
Keir Fraser [Wed, 2 Dec 2009 08:44:10 +0000 (08:44 +0000)]
libxl: include signal.h, required for SIGKILL definition
...makes libxl build on NetBSD.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Tue, 1 Dec 2009 14:19:28 +0000 (14:19 +0000)]
x86: Correctly allocate module-relocation area and bzimage headroom.
Without this patch, loading a bzimage dom0 kernel while also
requesting a dynamically-allocated crashkernel area is broken.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 14:08:27 +0000 (14:08 +0000)]
hvmloader: Fix bug in 20510:749b5d46e7a9 (GPE notifications)
The GPE notification decision tree was inverted.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 14:03:42 +0000 (14:03 +0000)]
libxenlight: wait for pv qemu initialization
This patch makes libxl_create_stubdom wait for pv qemu to be properly
initialized before unpausing the stubdom.
A new libxl_device_model_starting pointer is used to wait for pv qemu
initialization while the libxl_device_model_starting pointer given by
the user is initialized to a new structure with an empty for_spawn
member, because nothing that was spawned has to be waited for anymore.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 14:02:00 +0000 (14:02 +0000)]
x86: fix MCE/NMI injection
This attempts to address all the concerns raised in
http://lists.xensource.com/archives/html/xen-devel/2009-11/msg01195.html,
but I'm nevertheless still not convinced that all aspects of the
injection handling really work reliably. In particular, while the
patch here on top of the fixes for the problems mentioned in the
referenced mail also adds code to keep send_guest_trap() from
injecting multiple events at a time, I don't think this is the right
mechanism - it should be possible to handle NMI/MCE nested within
each other.
Another fix on top of the ones for the earlier described problems is
that the vCPU affinity restore logic didn't account for software
injected NMIs - these never set cpu_affinity_tmp, but due to it most
likely being different from cpu_affinity it would have got restored
(to a potentially random value) nevertheless.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 1 Dec 2009 13:59:47 +0000 (13:59 +0000)]
xen: turn numa=on by default
I did some benchmark runs (lmbench & kernel compile) with a number of
guests running in parallel to compare the performance of numa=on vs.
numa=off. As soon as one starts to load the machine, the performance
goes down in the numa=off case. The tests were done on an 8-node
machine (4 cores each). lmbench (actually copying large amounts of
memory) shows a dramatic dropdown, but I even noticed significant
performance decrease for a tmpfs based Linux kernel compile. Here is a
summary of the data:
lmbench's rd benchmark (normalized to native Linux (=100)):
guests        numa=off                numa=on           avg increase
          min    avg    max      min    avg    max
     1          78.0                  102.3
     7   37.4   45.6   62.0     90.6  102.3  110.9        124.4%
    15   21.0   25.8   31.7     41.7   48.7   54.1         88.2%
    23   13.4   17.5   23.2     25.0   28.0   30.1         60.2%
kernel compile in tmpfs, 1 VCPU, 2GB RAM, average of elapsed time:
guests   numa=off   numa=on   increase
     1    480.610   464.320       3.4%
     7    482.109   461.721       4.2%
    15    515.297   477.669       7.3%
    23    548.427   495.180       9.7%
again with 2 VCPUs and make -j2:
     1    264.580   261.690       1.1%
     7    279.763   258.907       7.7%
    15    330.385   272.762      17.4%
    23    463.510   390.547      15.7% (46 VCPUs on 32pCPUs)
Selected tests on a 4-node machine showed similar behavior (7.9 %
increase with 6 parallel guests on the 2 VCPU kernel compile
benchmark).
Note that this does not affect non-NUMA machines at all, since NUMA
will be turned off again by the code if no NUMA topology is detected.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
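The "increase" column of the 1 VCPU kernel compile table can be reproduced with the short Python sketch below. It assumes the column is the reduction in elapsed time relative to the numa=off run; the commit message does not state the formula, so this is an interpretation.

```python
def improvement_pct(numa_off, numa_on):
    """Reduction in elapsed time, as a percentage of the numa=off time."""
    return round((numa_off - numa_on) / numa_off * 100.0, 1)

# (guests, numa=off, numa=on) from the 1 VCPU kernel compile table.
rows = [
    (1, 480.610, 464.320),
    (7, 482.109, 461.721),
    (15, 515.297, 477.669),
    (23, 548.427, 495.180),
]
results = [improvement_pct(off, on) for _, off, on in rows]
```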
Keir Fraser [Tue, 1 Dec 2009 13:57:02 +0000 (13:57 +0000)]
libxc: pass the restore_context through function and allocate the context on the restore function stack.
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:56:26 +0000 (13:56 +0000)]
libxc: pass the suspend_context through function and allocate the context on the save function stack.
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:55:50 +0000 (13:55 +0000)]
libxc: move the domain_info_context into the restore_context
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:55:15 +0000 (13:55 +0000)]
libxc: move domain_info_context into the save_context
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:54:36 +0000 (13:54 +0000)]
libxc: move restore global variable to a global static context
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:54:01 +0000 (13:54 +0000)]
libxc: create a global context structure to record global variables in save
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:53:14 +0000 (13:53 +0000)]
libxc: create a domain_info_context structure to store guest_width and p2m_size for macros.
Macro now refers to guest_width and p2m_size through a dinfo pointer.
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:49:33 +0000 (13:49 +0000)]
libxenlight: enables less than maximum vcpus
Enable turning on a different amount of vcpus than
the maximum during domain creation/restore.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:48:48 +0000 (13:48 +0000)]
libxenlight: allow domain to publish its suspend evtchn
Allow domain to publish its suspend event channel.
Otherwise, the fast event-channel-based suspend
path is disabled.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:48:03 +0000 (13:48 +0000)]
libxenlight: write vcpu availability paths in xenstore
Write cpu availability paths to xenstore. Otherwise,
no vcpus other than the first are enabled.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:47:18 +0000 (13:47 +0000)]
libxenlight: remove vss and xapi path on domain destroy
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:46:31 +0000 (13:46 +0000)]
libxenlight: set domain handle
Set domain handle much like xend does, identical to
the uuid. This allows obtaining the uuid of a domain
from the handle in the dominfo struct.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:45:45 +0000 (13:45 +0000)]
libxenlight: fix uuid code
- Use proper constants
- Use functions from the uuid library
- Fix broken pointer handling in libxl_dominfo
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:44:13 +0000 (13:44 +0000)]
libxenlight: avoid writing empty values to xenstore
Prevent segmentation fault caused by empty values
in key-value pairs for the /vm/ subdirectory
when creating a pv domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:41:38 +0000 (13:41 +0000)]
sysctl: Fix mis-allocation of number for XEN_SYSCTL_lockprof_op
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:39:51 +0000 (13:39 +0000)]
Revert 20523:bd52fff29e6e "Remove redundant tests in __start_xen()"
Consensus is that code is clearer with the tests, even though they are
redundant.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:38:18 +0000 (13:38 +0000)]
xentop: Add tmem-freeable info when tmem is active
(No change to xentop output when tmem is inactive.)
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Tue, 1 Dec 2009 13:37:20 +0000 (13:37 +0000)]
xenstat: Linux dom0 statistics for case we use network bonding
I've created a patch that alters dom0 statistics (if empty like in
case of network bonding) and puts network bridge statistics
instead. It's been tested with network bonding both enabled and
disabled and also by creating a standalone network bridge without
bonding... It was working fine in all my tests...
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Tue, 1 Dec 2009 13:36:22 +0000 (13:36 +0000)]
Report hardware tsc frequency even for emulated tsc
I was starting some documentation for tsc_mode and
realized this discussion was never resolved. Currently
when TSC is emulated the pvclock algorithm reports
to a PV OS Xen's system clock hz rate (1GHz). Linux
at boottime samples the TSC rate and shows it in
dmesg and the rate is also shown in the "cpu MHz"
field in /proc/cpuinfo. So when TSC is emulated,
it appears that the processor MHz is 1000.0, which
is likely to be confusing to many Xen users.
This patch changes the reported hz rate to the
hz rate of the initial machine on which the guest
is booted and retains that reported hz rate across
save/restore/migration.
Jeremy has pointed out that reporting 1000.0 MHz is
useful because it shows that TSC is being emulated.
However, with the new tsc_mode default where
a guest may start with native TSC and switch to
emulated TSC after migration, users are likely to
get even more confused. And "xm debug-key s"
reveals not only whether TSC is being emulated but
also the frequency so is more descriptive anyway.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Tue, 1 Dec 2009 13:35:28 +0000 (13:35 +0000)]
tools: avoid cpu over-commitment if numa=on
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Tue, 1 Dec 2009 13:34:38 +0000 (13:34 +0000)]
libxenlight: fix segfault when reading blktap2 devs
This patch fixes a possible segfault when reading from
/sys/class/blktap2/devices, if the line read is empty.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:34:10 +0000 (13:34 +0000)]
libxenlight: fix multiple console with stubdoms
libxenlight doesn't properly handle the multiple pv console case,
needed to support an emulated serial in hvm guests with stubdoms.
This patch fixes it.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 11:48:36 +0000 (11:48 +0000)]
x86: Remove redundant tests in __start_xen()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Keir Fraser [Mon, 30 Nov 2009 10:58:23 +0000 (10:58 +0000)]
ia64: eliminate build warnings
Various warnings appeared since 3.4 - eliminate at least some of them.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Mon, 30 Nov 2009 10:57:42 +0000 (10:57 +0000)]
xend: fix bugs in c/s 20321:7a69f773548e "add a config description item for each guest"
Signed-off-by: james song (wei)<jsong@novell.com>
Keir Fraser [Mon, 30 Nov 2009 10:54:20 +0000 (10:54 +0000)]
libxenlight: implement blktap2 support
This patch implements blktap2 support in libxenlight; blktap2 is only
enabled if it is actually supported by the host, otherwise we fall
back to the previous code. Also for the moment we pretend that disk
type file is actually tap:aio.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 10:53:39 +0000 (10:53 +0000)]
libxenlight: fix suspend/resume
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 10:47:36 +0000 (10:47 +0000)]
libxenlight: add console command
This patch adds "xl console" command similar to "xm console".
Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 10:41:28 +0000 (10:41 +0000)]
libxenlight: fix hvm flag when no hvmloader
Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 10:38:58 +0000 (10:38 +0000)]
x86/mm: set_p2m_entry() should return 0 on error
set_p2m_entry() ignores halfway errors.
It should return 0 on error.
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
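The error-propagation pattern can be sketched in Python as below. This is an illustration rather than the Xen code; `set_one` is a hypothetical per-chunk setter that returns a true value on success.

```python
def set_entries(chunks, set_one):
    """Apply set_one to every chunk; return 1 on success, 0 on any error."""
    rc = 1
    for chunk in chunks:
        if not set_one(chunk):
            # Remember a failure partway through instead of dropping it.
            rc = 0
    return rc
```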
Keir Fraser [Fri, 27 Nov 2009 08:09:26 +0000 (08:09 +0000)]
xm: Allow detaching vif by MAC address
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Fri, 27 Nov 2009 08:05:18 +0000 (08:05 +0000)]
VT-d: Free unused interrupt remapping table entry
This patch changes the IRTE allocation method, and frees unused
IRTE when device is de-assigned.
Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
Keir Fraser [Fri, 27 Nov 2009 07:56:38 +0000 (07:56 +0000)]
build: Execute mk_dsdt with path
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 26 Nov 2009 15:27:00 +0000 (15:27 +0000)]
hvmloader: Auto-generate IRQ routing tables in ACPI DSDT.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 26 Nov 2009 14:49:40 +0000 (14:49 +0000)]
libxenlight: implement pause and unpause
This patch adds domain pause and unpause commands to xl, implementing
them using the already existing functions libxl_domain_pause and
libxl_domain_unpause.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Thu, 26 Nov 2009 13:51:16 +0000 (13:51 +0000)]
hvmloader: Auto-generate the lengthy pattern-based sections of ACPI DSDT.
At the same time, replace a lengthy linear GPE notification method,
with a logarithmic binary chop. Based on a patch by Simon Horman.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
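The difference between a linear chain of tests and a binary chop can be seen in this Python sketch, which counts the comparisons needed to locate one slot. The function name and the default of 32 slots are assumptions for illustration; the generated DSDT itself is ASL, not Python.

```python
def binary_chop(slot, n_slots=32):
    """Find `slot` in [0, n_slots) by halving the range.

    Returns (slot_found, comparisons_made).
    """
    lo, hi = 0, n_slots
    comparisons = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        comparisons += 1
        if slot < mid:
            hi = mid
        else:
            lo = mid
    return lo, comparisons
```

For 32 slots every lookup takes 5 comparisons, where a linear chain of tests could need up to 32.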
Keir Fraser [Thu, 26 Nov 2009 11:35:27 +0000 (11:35 +0000)]
x86: Remove redundant logic for mp_register gsi.
Xen's irq and gsi are identity mapped, so there is no need
to record the irq and gsi mapping in this array. In addition,
the mapping may not be correct, since dom0 may not figure the GSI
from 16 on.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Thu, 26 Nov 2009 11:31:16 +0000 (11:31 +0000)]
x86 shadow: don't try to unshadow for p2m changes after the shadows
have been torn down.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Thu, 26 Nov 2009 11:30:42 +0000 (11:30 +0000)]
Revert 20505:44ea369eefc1
Keir Fraser [Thu, 26 Nov 2009 11:24:50 +0000 (11:24 +0000)]
x86: Always respect guest setting CR4.TSD
Also fix guest reads of CR4.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 26 Nov 2009 11:02:30 +0000 (11:02 +0000)]
x86 shadow: fix race when domain is dying
There are some cases where shadow_write_p2m_entry() is called after
the domain is killed. It causes Xen to crash.
- Race between xc_map_foreign_batch from qemu-dm and "xm destroy"
command.
- The hypervisor calls domain_crash when PoD fails.
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Keir Fraser [Thu, 26 Nov 2009 11:00:49 +0000 (11:00 +0000)]
Implement rdtscp emulation and rdtscp_aux "support"
The rdtscp instruction (and the associated TSC_AUX
msr) are present on most recent AMD processors,
and on the Nehalem and future Intel processors.
Cpuid has a bit to detect the presence of this feature.
Xen intentionally does not expose the cpuid rdtscp bit
to PV OS's or to HVM guests, but PV apps can see this
bit and, as a result, may choose to use the rdtscp
instruction. When a PV guest with such an app is migrated
to a machine that does not have rdtscp support, the
app will get killed due to an invalid op. Fix this
by emulating the rdtscp instruction. We also need
to emulate rdtscp in the case where the machine has
rdtscp support, but rdtsc emulation is enabled (which
is unfortunately a different path: a privileged op).
The rdtscp instruction reads the TSC_AUX register which
presumably is set by the OS (and, in the case of
tsc_mode==pvrdtscp, will be set by Xen). HV Linux
and PV Linux will not set TSC_AUX because the
cpuid rdtscp bit is not propagated by Xen; I'm told that
Windows always sets TSC_AUX to zero. So for PV guests
running on rdtscp-capable hardware (that don't use
tsc_mode==pvrdtscp), always set TSC_AUX to zero.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Thu, 26 Nov 2009 11:00:15 +0000 (11:00 +0000)]
libxc: Fix 32-vs-64 bitness issue in saving vcpu contexts in core dump
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 26 Nov 2009 10:57:26 +0000 (10:57 +0000)]
xm: Fix maxvcpus support
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Thu, 26 Nov 2009 10:56:49 +0000 (10:56 +0000)]
xend: little fix for tap
Need to get the dev type after creating the tap device, as device_create did.
Signed-off-by: Wei Kong <weikong.cn@gmail.com>
Keir Fraser [Wed, 25 Nov 2009 14:19:50 +0000 (14:19 +0000)]
libxenlight: move logging macros to the public header
This patch moves the logging macros to the public header so that they
can be reused by the client of the library. It also refactors the
code to create the qemu logfile into a generic function that can be
reused to create generic xen logfiles under /var/log/xen. Finally xl
is changed to log to file when running in background.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:19:20 +0000 (14:19 +0000)]
libxenlight: clean up the domain when it dies
This patch adds two functions to libxenlight to be able to recognize
when a particular domain dies. After creating a domain, xl uses these
functions to wait for its death and clean up its resources.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:15:57 +0000 (14:15 +0000)]
x86 time: Fix build and clean up.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:12:58 +0000 (14:12 +0000)]
x86 hpet: Do nothing in hpet_broadcast_exit() if no timer deadline.
From: "Jiang, Yunhong" <yunhong.jiang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:11:37 +0000 (14:11 +0000)]
libxenlight: implement stubdom support
this patch implements stubdom support for libxenlight:
- it adds two functions to find the stubdom domid of a domain and to
figure out if a certain domain is actually a stubdom;
- it moves all the device init functions from xl.c to libxl.c because
they are needed to setup the devices of stubdoms;
- it fixes some bugs in the pci setup that prevented pci passthrough
from working correctly with stubdoms.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:11:02 +0000 (14:11 +0000)]
xm: Add maxvcpus support
This is a patch to add maxvcpus support to the xen xm command. It's using
vcpu_avail bitmask and sets the number of vcpus to maxvcpus if
present. If it's not present, old behavior is preserved.
In domain config file you can define it as follows:
maxvcpus = 4
vcpus = 2
this automatically sets vcpus to 4 and corresponding bitmask to
present 2 vcpus in the guest with option to increase it up to 4
vcpus. If maxvcpus is not present, the old behavior for vcpus is
preserved, ie. you can set vcpus to some number of vcpus to be used
and the vcpu_avail is set appropriately to use all of them. Only when
you use maxvcpus and vcpus new vcpu_avail value is calculated to show
PV guest the desired number of vcpus only.
It's been tested using RHEL-5 32-bit PV guest with maxvcpus = 4 and
vcpus = 2 and also the previous setup of vcpus = 2 only... In both
cases I was able to use 'xm vcpu-set {domainId} {numberOfVCPUs}' to
change the vcpu count from 0 to maxvcpus/vcpus, so it was working as
designed.
Signed-off-By: Michal Novotny<minovotn@redhat.com>
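The mapping from the two config options to a vcpu count and an availability bitmask can be sketched as follows. This is a Python illustration, not xend's code; the function name is hypothetical and the sketch assumes the lowest-numbered vcpus are the ones made available.

```python
def resolve_vcpus(vcpus, maxvcpus=None):
    """Return (vcpu_count, vcpu_avail) for a domain config.

    With maxvcpus set, the domain is built with maxvcpus vcpus and only
    the first `vcpus` of them are marked available. Without it, all
    `vcpus` vcpus are created and available (the old behavior).
    """
    avail_mask = (1 << vcpus) - 1
    if maxvcpus is None:
        return vcpus, avail_mask
    return maxvcpus, avail_mask
```

For the example config above (maxvcpus = 4, vcpus = 2) this gives 4 vcpus with a mask of 0b11.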
Keir Fraser [Wed, 25 Nov 2009 14:06:17 +0000 (14:06 +0000)]
cpuidle: Add decaying history logic to menu idle predictor
this patch is ported from linux upstream git commit
816bb611e41be29b476dc16f6297eb551bf4d747
the original description is:
"
Add decaying history of predicted idle time, instead of using the last
early wakeup. This logic helps menu governor do better job of
predicting idle time.
With this change, we also measured noticeable (~8%) power savings on a
DP server system with CPUs supporting deep C states, when system was
lightly loaded. There was no change to power or perf on other load
conditions.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
"
In the Xen environment, we also observe that this patch reduces idle power
fluctuation. On one DP server, when the system is purely idle, the watts
stdev/average drops from 6% to 2%, which helps idle power
measurement accuracy. There is no performance or power change when the
system is loaded.
Signed-off-by: Yu Ke <ke.yu@intel.com>
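The idea of a decaying history, as opposed to remembering only the last sample, can be sketched with a generic exponentially weighted average. This is not the formula from the Linux commit or the Xen port; the function name and the weight are assumptions chosen for illustration.

```python
def update_history(history_us, sample_us, weight=0.25):
    """Blend a new sample into the running history.

    Older samples fade geometrically instead of being discarded at once.
    """
    return (1.0 - weight) * history_us + weight * sample_us
```

A single unusually early wakeup then shifts the prediction only partway, rather than replacing it.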
Keir Fraser [Wed, 25 Nov 2009 14:05:28 +0000 (14:05 +0000)]
Replace tsc_native config option with tsc_mode config option
(NOTE: pvrdtscp mode not finished yet, but all other
modes have been tested so sooner seemed better than
later to submit this fairly major patch so we can get
more mileage on it before next release.)
New tsc_mode config option supersedes tsc_native and
offers a more intelligent default and an additional
option for intelligent apps running on PV domains
("pvrdtscp").
For PV domains, default mode will determine if the initial
host has a "safe"** TSC (meaning it is always synchronized
across all physical CPUs). If so, all domains will
execute all rdtsc instructions natively; if not,
all domains will emulate all rdtsc instructions while
providing the TSC hertz rate of the initial machine.
After being restored or live-migrated, all PV domains will
emulate all rdtsc instructions. Hence, this default mode
guarantees correctness while providing native performance
in most conditions.
For PV domains, tsc_mode==1 will always emulate rdtsc
and tsc_mode==2 will never emulate rdtsc. For tsc_mode==3,
rdtsc will never be emulated, but information is provided
through pvcpuid instructions and rdtscp instructions
so that an app can obtain "safe" pvclock-like TSC information
across save/restore and live migration. (Will be completed in
a follow-on patch.)
For HVM domains, the default mode and "always emulate"
mode do the same as tsc_native==0; the other two modes
do the same as tsc_native==1. (HVM domains since 3.4
have implemented a tsc_mode=default-like functionality,
but also can preserve native TSC across save/restore
and live-migration IFF the initial and target machines
have a common TSC cycle rate.)
** All newer AMD machines, and Nehalem and future Intel
machines have "Invariant TSC"; many newer Intel machines
have "Constant TSC" and do not support deep-C sleep states;
these and all single-processor machines are "safe".
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
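The PV-domain rules above can be summarised as a small decision function.
This is a sketch of the behaviour as described in the message, not the
hypervisor code. The constant names are hypothetical, and the value 0 for
the default mode is an assumption (the message only numbers modes 1 to 3).
HVM domains are left out because the message defines them only in terms
of the old tsc_native option.

```python
TSC_MODE_DEFAULT = 0          # assumed value; not stated in the message
TSC_MODE_ALWAYS_EMULATE = 1
TSC_MODE_NEVER_EMULATE = 2
TSC_MODE_PVRDTSCP = 3


def pv_rdtsc_emulated(tsc_mode, host_tsc_safe, restored_or_migrated):
    """Return True if rdtsc is emulated for a PV domain."""
    if tsc_mode == TSC_MODE_DEFAULT:
        # Native only on a "safe" initial host; always emulated after
        # a restore or live migration.
        return restored_or_migrated or not host_tsc_safe
    if tsc_mode == TSC_MODE_ALWAYS_EMULATE:
        return True
    if tsc_mode in (TSC_MODE_NEVER_EMULATE, TSC_MODE_PVRDTSCP):
        return False
    raise ValueError("unknown tsc_mode: %r" % (tsc_mode,))
```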
Keir Fraser [Wed, 25 Nov 2009 14:04:46 +0000 (14:04 +0000)]
hvmloader: Advertise ECC memory in SMBIOS tables.
Microsoft's Windows logo certified hardware requires ECC; since the
SVVP certification runs the same test on the guest, Xen domains will
currently fail it.
From: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 24 Nov 2009 14:43:07 +0000 (14:43 +0000)]
x86: Add a new physdev_op PHYSDEVOP_setup_gsi for GSI setup.
GSIs 0-15 are set up by the hypervisor, and GSIs >= 16 are set up by dom0
through the physdev_op PHYSDEVOP_setup_gsi. This patch helps dom0
avoid intrusive ioapic changes.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Tue, 24 Nov 2009 14:38:37 +0000 (14:38 +0000)]
tmem: fix freeable memory accounting error
Fix tmem accounting error that causes an "apparent"
memory leak, creating false negatives when testing
memory availability for launching a new domain.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Tue, 24 Nov 2009 14:37:59 +0000 (14:37 +0000)]
tmem: Fix another race in tmem on domain destroy.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Mon, 23 Nov 2009 15:19:38 +0000 (15:19 +0000)]
Revert 20457:1bbc132675a2
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 23 Nov 2009 08:06:54 +0000 (08:06 +0000)]
pygrub: add basic support for parsing grub2 style grub.cfg file
This represents a very simplistic approach to parsing these files. It
is basically sufficient to parse the files produced by Debian
Squeeze's version of update-grub. The actual grub.cfg syntax is much
more expressive but apparently not documented apart from a few
examples...
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
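As an illustration of what a simplistic grub.cfg parser of this kind might
look like, the sketch below collects the title and the commands of each
top-level menuentry block. It is not the pygrub implementation: the
function name and the returned structure are hypothetical, and nested
braces and other grub2 scripting constructs are deliberately ignored.

```python
import re

# Matches: menuentry "title" ...   or   menuentry 'title' ...
MENUENTRY_RE = re.compile(r"""^menuentry\s+(["'])(?P<title>.*?)\1""")


def parse_grub2_entries(text):
    """Return a list of {"title": str, "commands": [(name, args)]} dicts."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = MENUENTRY_RE.match(line)
        if match and current is None:
            current = {"title": match.group("title"), "commands": []}
        elif line == "}" and current is not None:
            entries.append(current)
            current = None
        elif current is not None:
            # Inside an entry: split "command args..." on first whitespace.
            parts = line.split(None, 1)
            args = parts[1] if len(parts) > 1 else ""
            current["commands"].append((parts[0], args))
        # Lines outside any menuentry (e.g. "set default=0") are ignored.
    return entries
```

Keeping the title separate from the command list mirrors the point made
in the related pygrub commits that the grub2 title syntax does not fit
the line-based layout used for grub-legacy entries.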
Keir Fraser [Mon, 23 Nov 2009 08:06:19 +0000 (08:06 +0000)]
pygrub: track the title of an item as an independent field
separate from the other fields.
This makes the list of lines within a GrubImage 0-based rather than
1-based; therefore adjust the user interface parts to suit.
This is in preparation for grub2 support where the syntax for the item
title does not fit the existing usage.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Mon, 23 Nov 2009 08:05:49 +0000 (08:05 +0000)]
pygrub: factor generic Grub functionality into GrubConf base classes
and inherit from these classes to implement Grub-legacy functionality.
Use a tuple of (parser-object,configuration-file) in pygrub to allow
for multiple parsers.
Makes way for grub2 support.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>